A Heuristic Strategy for Learning in Partially Observable and Non-Markovian Domains
Authors
Abstract
Robotic applications are characterized by highly dynamic domains in which the agent has neither full control of the environment nor full observability. In such cases, a Markovian model of the domain, able to capture all the aspects the agent might need to predict, is generally unavailable or excessively complex. Moreover, robots pose strong constraints on the amount of experience they can afford, shifting the focus of learning from reaching optimality in the limit to making the best use of the little information available. We consider the problem of finding the best deterministic policy in a Non-Markovian Decision Process, with special attention to sample complexity and to the transitional behavior before such a policy is reached. We would like robotic agents to learn in real time while deployed in the environment, and their behavior to be acceptable even while learning.
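To make the setting concrete, the sketch below contrasts a Markovian (reactive) policy with a policy that conditions on the whole observation history, which is what a deterministic policy in a non-Markovian domain may in general require. The environment interface (`reset`/`step`) and all function names are illustrative assumptions, not from the paper.

```python
# Minimal sketch: in a non-Markovian domain the best deterministic
# policy may need the observation history, not just the last
# observation. Interface and names are assumptions, not the paper's.

def reactive_policy(obs):
    """Markovian policy: action depends only on the current observation."""
    return 0 if obs < 0.5 else 1

def history_policy(history):
    """Non-Markovian policy: action depends on the observation history.
    Here, a toy rule based on how often a high observation was seen."""
    return sum(1 for o in history if o >= 0.5) % 2

def run_episode(env, policy, use_history):
    """Assumes env.reset() -> obs and env.step(a) -> (obs, reward, done)."""
    obs, history, total, done = env.reset(), [], 0.0, False
    while not done:
        history.append(obs)
        action = policy(history) if use_history else policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total
```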
Similar resources
A Partially Observable Markovian Maintenance Process with Continuous Cost Functions
In this paper, a two-state Markovian maintenance process in which the true state is unknown is considered. The operating cost per period is a continuous random variable that depends on the state of the process. If an investigation cost is incurred at the beginning of any period, the system will be returned to the "in-control" state instantaneously. This problem is solved using the average criterion...
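The model just described reduces to a short simulation: a hidden two-state chain, a state-dependent continuous operating cost, and an investigation action that pays a fixed cost and resets the chain to the in-control state. The sketch below is only an illustration of that setup; all parameter names, values, and the example inspection policy are assumptions.

```python
# Toy simulation of the two-state maintenance model: the true state
# (in-control / out-of-control) is hidden, the per-period operating
# cost is a continuous r.v. whose mean depends on the state, and
# paying an investigation cost resets the state to in-control.
# All parameter values here are illustrative assumptions.
import random

def simulate(periods=1000, p_fail=0.1, inspect_cost=5.0,
             mean_cost=(1.0, 4.0), policy=lambda t: t % 10 == 0):
    state, total = 0, 0.0          # 0 = in-control, 1 = out-of-control
    for t in range(periods):
        if policy(t):              # investigate: pay cost, reset state
            total += inspect_cost
            state = 0
        # operating cost: exponential with state-dependent mean
        total += random.expovariate(1.0 / mean_cost[state])
        if state == 0 and random.random() < p_fail:
            state = 1              # hidden drift out of control
    return total / periods         # average cost per period

print(simulate())
```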
Reinforcement Learning through Global Stochastic Search in N-MDPs
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirement on the knowledge representation in order to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a wa...
Learning Policies with External Memory
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in which the agent’s actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a r...
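The external-memory idea lends itself to a short sketch: the agent's action set is augmented with set/clear operations on memory bits, and those bits are fed back as part of the observation, so a purely reactive learner can acquire history-dependent behavior. The class and attribute names below are assumptions, not the paper's implementation.

```python
# Sketch of the stigmergic external-memory idea: extra actions set or
# clear memory bits, and the bits are appended to the observation.
# Names and the env interface (n_actions, reset, step) are assumptions.

class MemoryWrapper:
    """Augments a discrete-action env with n_bits of external memory.

    Actions 0..env.n_actions-1 act on the environment; the remaining
    2*n_bits actions set or clear one memory bit without advancing
    the environment.
    """
    def __init__(self, env, n_bits=2):
        self.env, self.n_bits = env, n_bits
        self.n_actions = env.n_actions + 2 * n_bits

    def reset(self):
        self.bits = [0] * self.n_bits
        self.last_obs = self.env.reset()
        return (self.last_obs, tuple(self.bits))

    def step(self, action):
        if action < self.env.n_actions:          # ordinary env action
            self.last_obs, reward, done = self.env.step(action)
        else:                                     # memory action: write a bit
            idx, val = divmod(action - self.env.n_actions, 2)
            self.bits[idx] = val
            reward, done = 0.0, False
        return (self.last_obs, tuple(self.bits)), reward, done
```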
Efficient Exploration in Reinforcement Learning with Hidden State
Undoubtedly, efficient exploration is crucial for the success of a learning agent. Previous approaches to exploration in reinforcement learning exclusively address exploration in Markovian domains, i.e. domains in which the state of the environment is fully observable. If the environment is only partially observable, they cease to work because exploration statistics are confounded between alias...
HQ-learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning
To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovian...
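As a rough illustration of the subagent decomposition HQ-learning describes, the sketch below chains subagents, each with its own Q-table and a subgoal observation that hands control to the next subagent. The control-transfer rule and all names are simplified assumptions, not the authors' exact update equations.

```python
# Simplified illustration of HQ-learning's structure: an ordered list
# of subagents, each with its own Q-table and a subgoal observation;
# reaching the subgoal passes control to the next subagent.
import random
from collections import defaultdict

class SubAgent:
    def __init__(self, n_actions, subgoal):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.subgoal = subgoal              # observation ending this subtask

    def act(self, obs, eps=0.1):
        if random.random() < eps:           # epsilon-greedy exploration
            return random.randrange(len(self.q[obs]))
        return max(range(len(self.q[obs])), key=lambda a: self.q[obs][a])

    def update(self, obs, a, r, obs2, alpha=0.1, gamma=0.95):
        target = r + gamma * max(self.q[obs2])
        self.q[obs][a] += alpha * (target - self.q[obs][a])

def run_episode(env, subagents):
    """Assumes env.reset() -> obs and env.step(a) -> (obs, reward, done)."""
    obs, done, i = env.reset(), False, 0
    while not done and i < len(subagents):
        agent = subagents[i]
        a = agent.act(obs)
        obs2, r, done = env.step(a)
        agent.update(obs, a, r, obs2)
        if obs2 == agent.subgoal:           # subtask solved: next subagent
            i += 1
        obs = obs2
```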
Publication date: 2010